Meaning and mining: the impact of implicit assumptions in data mining for the humanities
Abstract
As the use of data mining and machine learning methods in the humanities becomes more common, it will be increasingly important to examine the implicit biases, assumptions, and limitations that these methods bring with them. This paper makes explicit some of the foundational assumptions of machine learning methods and presents a series of experiments as a case study and object lesson in the potential pitfalls of using data mining methods for hypothesis testing in literary scholarship. The worst dangers may lie in the humanist's ability to interpret nearly any result, projecting his or her own biases into the outcome of an experiment, perhaps all the more unwittingly because of the superficial objectivity of computational methods. We argue that in the digital humanities, the standards for the initial production of evidence should be even more rigorous than in the empirical sciences, because of the subjective nature of the interpretive work that follows. We therefore conclude with a discussion of recommended best practices for making results from data mining in the humanities as meaningful as possible. These include methods for keeping the boundary between computational results and subsequent interpretation as clearly delineated as possible.

A person who is trying to understand a text is always projecting. He projects a meaning for the text as a whole as soon as some initial meaning emerges in the text. Again, the initial meaning only emerges because he is reading the text with particular expectations in regard to a certain meaning. Working out this fore-projection, which is constantly revised in terms of what emerges as he penetrates into the meaning, is understanding what is there.
Hans-Georg Gadamer, Truth and Method
Journal: LLC
Volume: 23, Issue:
Pages: -
Publication date: 2008